ABSTRACT

Criterion-referenced tests were used to measure the learning and
retention of a sample of material taught by means of programmed
instruction in the Avionics Fundamentals Course, Class A. It was found
that the students knew about 30 percent of the material before reading
the programs, that mastery rose to a very high level on the immediate
posttest, and that about half of the improvement was lost by the end of
the course (an interval of about 96 days). There was considerable
variation in item difficulty by the end of the course. Most of this
variation was independent of topic difficulty or measures of item
difficulty obtained from the early posttests. Instructors (who were also
experienced technicians) were asked to indicate the items that were most
relevant to subsequent instruction or to performance on the job. These
ratings were not very reliable. The indicated items did not differ
appreciably from the remaining items in terms of student proficiency. It
was concluded that if the instructors were correct in their ratings,
there was enough forgetting to hinder a number of students in learning
from subsequent courses and in performing their assigned duties on the
job.
SUMMARY AND CONCLUSIONS



Problem 






There are several points within a training sequence at which it would be
helpful to have an absolute measure of student proficiency. A measure
before training would indicate the topics on which less training would be
required; in an operational system it might identify the students who are
in need of remedial instruction. A measure at the completion of training
would indicate the parts of the course that were in need of revision. A
measure at the point of application would indicate the topics that were in
need of review. These measures cannot be derived from conventional
norm-referenced tests of the kind used in most schools; they require,
instead, special criterion-referenced tests.



Background 



The purpose of the present investigation was to explore some of the
benefits and problems associated with the use of criterion-referenced tests
in an operational training situation. The study was based on the material
being taught in the first phase of the Avionics Fundamentals Course, Class
A. All materials were originally taught by means of programmed
instructional booklets.



Approach 



Proficiency was measured by means of the criterion-referenced tests that
had been used in the validation of the programs. Measures were obtained on
a pre-test, an immediate post-test, and at intervals of one day, seven days,
28 days, 96 days, and three years after instruction. All except the last
measure were obtained from the same students, but the particular items were
counterbalanced so that no student would take a given item twice.



Findings, Conclusions, Recommendations



It was found that students knew about 30% of the material before they
read the programs, that proficiency reached a very high level on the
immediate post-test, but that about half of this improvement was lost by
the end of the course. Over several years on the job, proficiency dropped
to a level that was less than 20 percentage points above that found on the
pre-test. By the end of the course there were large differences in
proficiency on different items (S.D. 27 percentage points). These were
reliable differences, but little of the variance could be attributed either
to general topics or to measures of item difficulty obtained at the earlier
testing points.

Ratings were used in an effort to identify the items which [...]

If the instructors' opinions can be taken at face value, then the
forgetting found in this study would be enough to hinder many students in
their learning of the material taught in subsequent courses and in their
performance on the job.

Information of the kind found in this study should be very helpful in
the design and control of a training sequence, but the systematic
collection of such information, particularly for a lengthy training
sequence, would be a major undertaking.
RETENTION OF ELECTRONIC FUNDAMENTALS: DIFFERENCES AMONG TOPICS



A. Introduction



The training of an electronics technician usually begins with a unit of
instruction on electronic fundamentals. The purpose of this unit is
twofold: first, to provide a "vocabulary" of concepts that can be used to
simplify and facilitate subsequent instruction and, second, to provide a set
of principles that can be used as a basis for the deduction of proper
maintenance procedures in those situations which have not been covered by
specific instructions.

If a system of this kind is to work effectively, it is obvious that 
the material taught in the initial unit must be remembered long enough for 
it to play its intended role in the system. Some of it need be retained only
to the point at which it can provide the basis for subsequent training; the 
remainder must be retained until it can be applied on the job. 

There have been several studies in which the retention of these basic 
concepts and skills was measured at various points within the training se- 
quence and at various intervals after the completion of formal training. 

Most of these studies were done at a time before a clear distinction had been 
drawn between criterion-referenced tests, which are designed to provide an 
absolute index of mastery, and norm-referenced tests, which are designed to 
provide an index of the relative differences between people (Glaser, 1963).
Since a large number of multiple-choice items were readily available from 
the norm-referenced tests used during training, it was only natural that 
these tests be used as the primary source of items for the tests of reten- 
tion. These studies reveal a decline in the mastery of basic concepts and 
skills that begins during training and continues out onto the job, but,
because of the ambiguities inherent in the use of a norm-referenced test as
an absolute index of mastery, there has been no way to estimate the practical 
significance of this decline. The primary purpose of the present study was 
to investigate the retention of this material through the use of criterion- 
referenced tests. 



B. Procedures 



1. Training Content 



At the time of this study the Avionics Fundamentals (AFU) course, taught 
at the Naval Air Technical Training Center, Memphis, lasted 16 weeks. 

For most students, this course was followed by an eight to 12 week course on 
a particular class of devices (radars, fire-control systems, etc.), and this 
in turn was followed by various courses on the equipment used in specific 
weapons systems. It would be helpful to have information on the retention of
materials taught throughout this sequence, but because of practical con- 
straints, the present study was limited to the first phase of the AFU course. 
This phase lasted five weeks and provided an introduction to basic a-c and 
d-c theory. 





The topics were further restricted to those that had been taught by means 
of programmed instruction. At the time of this study, approximately 40% of 
the five week period was devoted to in-class use of programmed instruction, 
approximately 20% was devoted to laboratory work, and the remaining 40% was 
devoted to lectures, discussions, demonstrations, drills, reviews, tests, 
etc. This restriction was prompted by the fact that the tests used in the 
validation of the programmed booklets provided "exhaustive" criterion-
referenced tests over all training objectives covered by the programs. The
effect of this restriction was probably fairly slight. The programmed book- 
lets had been designed to provide a reasonably self-sufficient introduction 
to a-c and d-c theory. It is doubtful that this particular mode of in- 
struction had a major effect on the retention data, either. Much of the 
material covered by the programs received further elaboration through other 
media of instruction. In addition, previous studies with these same materials 
and tests (Mayo and Longo, 1964; Longo and Mayo, 1965) have shown that after
a period of several days the retention of students taught by means of pro- 
grammed booklets is similar to the retention of students taught by more con- 
ventional means. 

The final restriction was prompted by the sheer quantity of material 
that remained. In order to reduce this material to a more manageable level, 
without at the same time changing its quality, all even-numbered programs 
were excluded from the study. This left a total of 19 programs that covered 
215 training objectives. A list of these programs can be found in Appendix A. 

2. Students 



Most of the data were collected from 85 students in a single class who 
were present at each of the major testing points during the course. These 
were students who proceeded through the course at a normal rate, without 
setbacks because of academic deficiencies or accelerations because of academic 
superiority. The original class contained 141 students. Of these, 9 were 
dropped for academic reasons, 7 were removed for administrative reasons, 21 
were set back to later classes for academic reasons, 17 were placed in
accelerated sections, and 2 were simply absent at one of the major testing
points. Since a normal graduating class would contain, in addition to the
students who convened with that same class, other students who, because of
setbacks or accelerations, convened with other classes, the 85 students used
in this study were not strictly representative of the normal school output.

The number of graduates lost because of superior performance (17) was fairly 
close to the number of graduates lost because of inferior performance (21), 
however, so the average performance of this group is probably fairly similar 
to that of a normal group, even though the variance of its performance is 
probably smaller. 
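
The attrition figures quoted above can be tallied as a quick consistency
check. The sketch below is only bookkeeping on the counts given in the text;
the dictionary keys are illustrative labels, not terms used in the report.

```python
# Losses from the original class of 141 students, as reported in the text.
original_class = 141
losses = {
    "dropped_academic": 9,
    "removed_administrative": 7,
    "set_back": 21,
    "accelerated": 17,
    "absent_at_testing": 2,
}

# Students who were present at every major testing point.
remaining = original_class - sum(losses.values())
print(remaining)  # 85
```

The result agrees with the 85 students on whom most of the data are based.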

Some additional data were collected from a group of 29 technicians who 
had just returned from the fleet in order to attend an advanced course in
avionics (AVIB). All of these technicians had attended the AFU school at
some time in the past. The median interval between graduation from the AFU 
school and testing was about three years. There are a variety of selective 
factors that might have affected the quality of this second sample, but the 
available evidence indicates an overall effect that is fairly small. Table 
1 contains background data on both samples. 
TABLE 1

Background Data on AFU and AVIB Samples

Sample    ETST    GCT     ARI     AFU Final Average

AFU       65.2    64.4    61.5          80.3
AVIB      63.9    61.0    61.3          78.2
Diff       1.3     3.4      .2           2.1



3. Test Items 



The retention tests, as was mentioned earlier, were made up of the cri- 
terion tests that had been used to validate the programs. A given training 
objective was generally covered by a single item, though many of these items 
were actually a composite of several fairly distinct problems. A student 
might, for example, be required to calculate several circuit values, or to 
transform several values from one set of units to another. Most of the
problems required a written answer of some kind, though a few of them were in a
multiple-choice or matching format. Some of the original items were modi- 
fied slightly in order to clarify their meaning when viewed in isolation. 

The specific values used in the problems were changed whenever this could be 
done without an obvious effect on the difficulty or the essential content of 
the item, since it was feared that without such changes the students might 
tend to respond on the basis of rote memory instead of the intended
calculations.

4. Testing Plan

Each of the criterion tests covered, on the average, 11.5 training ob- 
jectives. Each of these tests was broken down into six sets of items that 
were as closely matched as possible. For the single test that contained 
fewer than six items, it was possible to split composite items so as to form 
the required number of sets. 

The initial class of 141 students was broken down alphabetically into 
six groups of roughly the same size. These initial groupings were retained 
throughout the course, but losses of various kinds reduced their sizes. By 
the end of the course, there were from 12 to 17 students in each of the six 
groups. 

The items associated with a given program were given to students at each
of six points. The first of these was immediately prior to the
administration of the program. The second was at the end of the classroom
time allotted to the program. The next three were 1 day, 7 days, and 28 days
after the completion of the program, respectively. The final point was at
the end of the course, which was, on the average, 96 days after the
completion of the program.

For each program, the six item sets, six student groups, and six testing
points formed a Latin square. Each of the item sets and student groups was
represented once and only once at each of the testing points, and no
student repeated an item he had encountered previously.
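
The counterbalancing scheme can be illustrated with a small sketch. The
report does not say which Latin square was used, so the cyclic assignment
below is an assumption chosen for simplicity, and the function name is
hypothetical.

```python
N = 6  # number of item sets, student groups, and testing points

def item_set_for(group: int, point: int, n: int = N) -> int:
    """Item set (0-based) given to a student group at a testing point.

    A cyclic shift is one simple way to build a Latin square; it is an
    assumption here, not the assignment documented in the report.
    """
    return (group + point) % n

# Rows are student groups, columns are testing points.
square = [[item_set_for(g, t) for t in range(N)] for g in range(N)]

for row in square:
    print(" ".join(str(item_set + 1) for item_set in row))
```

In any such square every group meets each item set exactly once, so no
student repeats an item, and every item set appears exactly once at each
testing point.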

Since there was no problem of retesting with the AVIB students, each
student was given half the item sets. This provided data on from 14 to 15
students per item at this point, roughly the same amount of data that was
available at each of the testing points used with the AFU students.

5. Test Scoring 

For many of the items the answers were clearly right or wrong. For others,
however, there were varying degrees of "rightness." In most cases, the
scoring standards were fairly lenient. Partial credits were given when a
student missed only part of a composite item or when he indicated a
substantial knowledge of the correct response. They were given on
computational items, for example, when the error could be traced to a
mistake in arithmetic or in the designation of units, even though errors of
this kind could be quite serious on the job.

All items were scored by a single individual. For the AFU students, all
responses to a given item were scored at the same time, without knowledge of
the student group or the testing point from which a given response had been
obtained. The items from the AVIB sample were scored at a later date, but
against the background provided by the items from the AFU sample. Most of
the responses used by the AVIB sample had also been used by the AFU sample,
so there was little room for bias.



C. Results 

The results will be broken down into three sections. The first describes
average student performance at each of the seven testing points. The second
describes an attempt to draw finer distinctions among items on the basis of
the uses that will be made of the knowledge or skill being tested. The final
section describes the relationships between performance on various items or
topics at different points following the initial instruction.

1. Overall Retention 

Since all six student groups from the AFU course took all 218 items, it
was possible to compute a score for each student by averaging across items.
The averages of these individual scores for the six student groups were
.69, .69, .71, .69, .69, and .69, a remarkably close agreement.

[...] the percent of students who were completely right on the items in
question, even though, because of the partial credits, this is not strictly
the case.
[Figure 1. Distributions of Item Difficulties at Seven Testing Points.
Axis labels: Testing Points; Per Cent Items.]



An examination of the first testing point indicates that there was a 
moderate amount of this material known by the students before they took the 
programs. If it could be assumed that equal amounts of time were spent on 
each objective, then almost 30% of the total training time would have been 
spent in teaching students things that they already knew. There are several
reasons for rejecting this assumption, however. First, there was probably 
less material devoted to the objectives with which the student might be ex- 
pected to have some familiarity. Second, branches that would permit the 
student to bypass the material that he already knew were available in several 
of the programs. Finally, the student could adjust his own pace to some ex- 
tent by skimming over the material with which he was already familiar and 
slowing down for the material that was new. 

It should be remembered that the data at this point represent the
students' knowledge immediately before taking the various programs and not their
knowledge at the beginning of the course. Many objectives were covered in 
home study assignments, and some objectives in the later programs were intro- 
duced by programs earlier in the sequence. 

The data of the second testing point indicate a high degree of mastery, 
with roughly half the items being answered without a single error. The 
average, over all items, is 90%. Even so, this is an underestimate of the
actual maximum, since some of the students had not completed the programs at 
this time (those who did not finish a program within the allotted time 
finished it during later class periods or in the evening after school). In 
fact, as a result of these non-completions, only 58% of the students demon- 
strated mastery of 90% of the items tested at the second testing point, 
whereas more than 90% of the students used in the original validation of
the programs, all of whom were allowed to complete the programs, reached this 
same criterion. If the students in the present study had been tested at the 
actual completion of the programs, their average score would probably have 
been several percentage points higher than that which was found at the second 
testing point. 

Following the second testing point the highly skewed distribution of 
items becomes flatter and flatter, until, by the seventh testing point, it is 
an almost rectangular distribution extending from 0% to 100%. If the mid- 
points of these distributions were plotted against time, they would form a 
negatively accelerated function that resembles the classic curve of for- 
getting. It should be remembered, however, that these data were gathered 
from a task that differs considerably from the usual laboratory task. The 
intervals are filled with learning activities that provide massive opportuni- 
ties for both positive and negative transfer. For many of the items there 
is a great deal of direct rehearsal and practice. 

By the end of the course, the scores of the AFU students have dropped 
about half the distance between their highest level and their original level. 
The scores made by the AVIB students are less than 20 percentage points above 
the scores made by the AFU students on the pre-test. An examination of the 
correlations between student characteristics and proficiency indicated that 
none of the adjustments for differences between the two samples would in- 
crease the scores at the seventh testing point by more than 1.8 percentage 
points.



2. Specific Retention



There is obviously a considerable amount of forgetting that takes place 
by the end of the course, but its practical importance cannot be determined 
without a consideration of the individual items involved. A good deal of 
the material taught in this initial segment is taught purely as an aid to the 
learning of additional material. If this additional learning takes place 
within the AFU course itself, then there is no reason why the original 
material cannot be forgotten without any real loss to the student. 

In order to identify the material that should not be forgotten, the
various item sets were submitted to instructors from the AFU course, who 
were asked to indicate, for each item, whether the knowledge covered would 
be needed on the job and/or in the various courses which follow the AFU 
course. A copy of these instructions can be found in Appendix B. The in- 
structors had all been through the training sequence at some time in the 
past and had served at least one tour of duty in operating units. Each of 
the six item sets was evaluated by nine instructors, but, in general, the
ratings were not very reliable. A summary of these ratings can be found in
Appendix C.

The most reliable index was obtained from the ratings of relevance to the
job. The instructors endorsed about 43% of the items, the average
correlation between instructors was .31, and the overall reliability was .81.
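
The report does not say how the overall reliability was derived from the
average correlation between instructors. One common approach, assumed here
purely for illustration, is the Spearman-Brown formula for the reliability
of the pooled judgment of k raters. The function name is hypothetical.

```python
def spearman_brown(mean_r: float, k: int) -> float:
    """Reliability of the pooled judgment of k raters.

    mean_r is the average correlation between pairs of raters. Applying
    this formula to the report's figures is an assumption, not a method
    stated in the source.
    """
    return k * mean_r / (1 + (k - 1) * mean_r)

# Nine instructors with an average inter-instructor correlation of .31.
job_relevance = spearman_brown(0.31, 9)
print(round(job_relevance, 2))
```

Under this assumption the pooled reliability comes out at about .80, close
to the reported .81; the small gap may reflect rounding of the average
correlation or a different estimation procedure.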

The items that had been endorsed by at least seven of the nine instructors
were selected for further analysis. There were 44 such items, representing
about 20% of the total pool. Performance on these items has been graphed in
Figure 2. It can be seen that the students were somewhat more proficient on
these items than they were on the remaining items. By the seventh testing
point, the difference had increased to about 20 percentage points. This
difference provides some substantiation for the instructors' judgements,
since one would expect that the use of this information on the job would
maintain proficiency at a level above that found for the less frequently
used information. The absolute level of proficiency, however, is rather
disappointing. If this information is "really needed on the job," as was
stated by the instructors, then it would not seem unreasonable to expect a
level of proficiency approaching 100%; instead, it was found to be 65%.

Since the instructors' ratings may have been based on the assumption that
the job would entail maintenance at the level of individual components, 
part of the deficiencies noted above might be attributed to the fact that 
many of the students in the AVIB sample had little or no experience on this 
kind of job. In order to check on this possibility, the AVIB school sample 
was divided into two groups of roughly the same size. The first group had 
had more than six months experience on jobs that required maintenance at the 
level of individual components. Most of this experience was in Intermediate 
Maintenance Activities. The second group had had six months or less ex- 
perience in this type of work. A good deal of their time had been spent in 
Organizational Maintenance, though some technicians worked at jobs that were
almost completely divorced from normal maintenance activities. 

The two groups differed by less than three percentage points on total
scores. They differed by less than seven percentage points on the "needed"
items, a somewhat larger difference, but hardly one that could be used to
explain the deficiencies noted earlier.
A second analysis was made of the 60 items that had been endorsed by at
least seven of the nine instructors as being required for subsequent
training. The average instructor endorsed 59% of the items, the average
correlation between instructors was .12, and the overall reliability was
.53. Performance on these items was quite similar to performance on the
remaining
items. In fact, mean performance on these items was from one to three per- 
centage points below mean performance for all items at each of the seven 
testing points. 

Another analysis was performed on a set of 13 computational items from 
the areas indicated in Table 2. It was felt that these items would provide 
a fairly objective benchmark for the remaining items, since the content of 
computational items can be readily specified and both the questions and 
answers are free from the ambiguities encountered in some of the more purely 
verbal items. These particular items were selected from a larger set of 
computational items because the skills involved, from an admittedly subjective 
point of view, appeared to be the most basic. 



TABLE 2

Areas Covered by Computational Items

Area Tested                                                 Number of Items

Conversions from one metric prefix to another                      2
(e.g., 20 ma = ____ µa)

Solution of simple problems using metric prefixes                  2
(e.g., 100 kv ÷ 10 ma = 10 MΩ)

Calculation of voltage, current, power, or resistance              6
in various parts of simple (2 to 3 resistors) resistive
parallel circuits (e.g., if total current is 3 a, and both
R1 and R2 are 20 Ω, what is the total power being
dissipated?)

Calculation of voltage, current, or impedance transmitted          3
across a transformer (e.g., with 60 turns in primary, 100
turns in secondary, and 120 v applied to primary, what
voltage is induced in the secondary?)
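
For reference, the example problems quoted in Table 2 can be worked through
as follows. This is only a sketch: the helper names are hypothetical, the
transformer is treated as ideal, and the two resistors are assumed to be the
only loads in the parallel circuit. Exact fractions are used to avoid
rounding noise.

```python
from fractions import Fraction

# Powers of ten for the metric prefixes used below ("" is the base unit).
PREFIX_EXPONENT = {"M": 6, "k": 3, "": 0, "m": -3, "u": -6}

def convert(value, from_prefix: str, to_prefix: str) -> Fraction:
    """Re-express a quantity under a different metric prefix."""
    shift = PREFIX_EXPONENT[from_prefix] - PREFIX_EXPONENT[to_prefix]
    return Fraction(value) * Fraction(10) ** shift

def parallel_resistance(*resistors) -> Fraction:
    """Equivalent resistance of resistors connected in parallel."""
    return 1 / sum(Fraction(1) / Fraction(r) for r in resistors)

# 20 ma expressed in microamperes.
microamps = convert(20, "m", "u")

# 100 kv divided by 10 ma (Ohm's law), expressed in megohms.
ohms = convert(100, "k", "") / convert(10, "m", "")
megohms = convert(ohms, "", "M")

# Total current of 3 a through R1 = R2 = 20 ohms in parallel: P = I^2 * R.
r_total = parallel_resistance(20, 20)
power_watts = 3 ** 2 * r_total

# Ideal transformer: 60 primary turns, 100 secondary turns, 120 v applied.
secondary_volts = Fraction(120) * 100 / 60

print(microamps, megohms, power_watts, secondary_volts)  # 20000 10 90 200
```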



As can be seen in Figure 2, performance on these items was also quite 
similar to performance on the remaining items. The scores of the AVIB stu- 
dents who were more experienced in component level maintenance exceeded those 
of the less experienced students by about 13%. A more detailed discussion 
of computational skills can be found in Appendix D. 
The last of the separate analyses was made on a group of ten composite
items, all of which required the technician to indicate the effect (i.e.,
increase, decrease, or remain the same) that certain changes in simple RCL
circuits (i.e., an increase or decrease in capacitance, inductance,
resistance, or frequency) would have on certain aspects of circuit operation
(e.g., X, Z_t, I_C, E). It was found that the average score at the sixth
testing point was 63% and that the average score at the seventh testing
point was 33%. The latter is exactly what would have been expected on the
basis of guessing alone, since each response had three alternatives.

3. The Measurement of Mastery 

Much of the recent interest in criterion-referenced tests has centered on 
their use as a means for providing quality control in instructional systems. 

It has been recognized that certain of the traditional psychometric considera- 
tions are not relevant in such applications, but there is very little data 
available on the characteristics that are relevant. 

The scores from the current study were analyzed so as to provide informa- 
tion on the reliability of the criterion-referenced tests and on the extent 
to which tests administered at various points in time agree with one another. 
Reliability, as used here, refers to the tests' ability to make reliable 
discriminations among various training objectives or lessons, rather than to 
their ability to make reliable discriminations among students. Similarly,
agreement across testing points is measured in terms of items or topics
rather than students.

a. Single Training Objectives. The reliability of the tests at each of
the testing points was estimated separately for each of the student groups.
In each case the estimate was based on a score matrix that was approximately
36 (items) by 14 (students). The final estimate for a given point was then
calculated by taking a weighted average, using Fisher's z transformation,
over the six individual estimates. These final estimates have been placed,
in parentheses, along the principal diagonal of Table 3. The variations in
reliability follow fairly closely the variations in the standard deviations
of item scores, which can be found in the last column of the table.
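
Averaging through Fisher's z transformation can be sketched as follows. The
report does not state which weights were used, so the weights are left as a
parameter; the coefficients and group sizes in the example are hypothetical
and are not data from the study.

```python
import math

def fisher_z_average(coefficients, weights):
    """Weighted average of correlation-type coefficients via Fisher's z.

    Each coefficient r is mapped to z = atanh(r), the z values are averaged
    with the given weights, and the mean is mapped back with tanh.
    """
    total = sum(weights)
    mean_z = sum(w * math.atanh(r) for r, w in zip(coefficients, weights)) / total
    return math.tanh(mean_z)

# Hypothetical reliabilities for six student groups, weighted by
# hypothetical group sizes (illustration only).
example_r = [0.88, 0.90, 0.91, 0.89, 0.92, 0.90]
example_n = [12, 14, 15, 13, 17, 14]
pooled = fisher_z_average(example_r, example_n)
print(round(pooled, 2))
```

Working in z rather than in r keeps high coefficients from being averaged
on a compressed scale, which matters most when the individual estimates are
near 1.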

The correlations between testing points were based on the average item
scores at each point; there was no division into student groups. Such a pro- 
cedure will treat any differences between student groups as error, but as was 
stated earlier, there was very little difference between the six groups from 
the AFU school. A similar check could not be made on the six groups from 
the AVIB school, so it was simply assumed that they, too, would be fairly 
similar to one another. These correlations can be found above the principal 
diagonal in Table 3. 

Since there were not very many students per item, a deficiency that could 
be readily corrected with additional testing, the correlations between test 
points were corrected for attenuation. The corrected coefficients can be 
found below the principal diagonal in Table 3.
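
The report does not spell out the correction formula. The standard
correction for attenuation, assumed here, divides the observed correlation
by the square root of the product of the two reliabilities. The function
name is hypothetical; the inputs are taken from Table 3.

```python
import math

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimate the correlation between two measures free of unreliability.

    r_xy is the observed correlation; r_xx and r_yy are the reliabilities
    of the two measures.
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# (observed r, reliability of first point, reliability of second point)
pairs = {
    "points 1 and 2": (0.21, 0.90, 0.69),
    "points 2 and 3": (0.60, 0.69, 0.66),
    "points 3 and 6": (0.47, 0.66, 0.89),
}

corrected = {name: correct_for_attenuation(*v) for name, v in pairs.items()}
for name, value in corrected.items():
    print(name, round(value, 2))
```

For these three pairs the formula yields .27, .89, and .61, which match the
corresponding entries below the diagonal of Table 3.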

If the first testing point is excluded, the rest of the matrix falls into
a generally simplicial form. Tests that are close to one another in time are
more highly correlated than those which are more widely separated in time.
The correlations with the pre-test, on the other hand, tend to increase with
increasing separations from the point of original learning. Both patterns
hold for the corrected matrix as well as for the original matrix.

TABLE 3

Reliabilities and Intercorrelations for
Seven Testing Points: Objectives

Testing                      Testing Points
Points       1      2      3      4      5      6      7      S.D.

   1       (.90)   .21    .32    .32    .40    .44    .54     .27
   2        .27   (.69)   .60    .40    .39    .32    .31     .13
   3        .41    .89   (.66)   .59    .53    .47    .42     .13
   4        .37    .53    .79   (.85)   .67    .60    .53     .22
   5        .46    .51    .70    .79   (.85)   .74    .64     .23
   6        .49    .40    .61    .69    .85   (.89)   .73     .27
   7        .61    .40    .55    .61    .74    .82   (.89)    .28


The most important finding, from a practical point of view, is that
performance on the early post-tests does not provide a very powerful basis
for predicting performance at the more delayed testing points. The highest
of these correlations, that between testing point 3 and testing point 6,
accounts for only 37% of the variance in the delayed test, even after the
correction for attenuation.

b. Lessons. In order to estimate the reliability of the tests in
ordering the 19 lessons, the following procedures were followed at each
testing point. First, an average score was computed for each item by
averaging over students. Next, an average score for each lesson was computed
by averaging over the item scores for each lesson in a given set of items.
This provided six estimates (one for each item set) of lesson difficulty.
These were analyzed as a 6 (item sets) by 19 (lessons) score matrix in order
to provide an estimate of reliability at that particular testing point.
These estimates can be found, in parentheses, along the principal diagonal
of Table 4. The reliabilities are again associated with the standard
deviations, but the degree of reliability is not as high as it was for the
individual items.
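
The report refers to an internal consistency model but does not give the
formula. One common estimate of this kind, assumed here only for
illustration, is coefficient alpha, with lessons treated as the cases and
item sets as the parts. The scores below are made up and are not data from
the study.

```python
from statistics import pvariance

def cronbach_alpha(matrix):
    """Internal-consistency reliability of row totals.

    matrix[i][j] is the score of case i (here, a lesson) on part j (here,
    an item set). Using alpha for the report's lesson reliabilities is an
    assumption, not the documented procedure.
    """
    k = len(matrix[0])
    part_variances = [pvariance([row[j] for row in matrix]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in matrix])
    return k / (k - 1) * (1 - sum(part_variances) / total_variance)

# Made-up average scores for 4 lessons on 3 item sets (illustration only).
scores = [
    [0.90, 0.85, 0.80],
    [0.60, 0.70, 0.65],
    [0.40, 0.35, 0.50],
    [0.75, 0.80, 0.70],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

With the study's layout the matrix would have 19 rows (lessons) and 6
columns (item sets) at each testing point.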



The correlations between testing points can be found above the principal
diagonal in Table 4. The lesson scores used in this analysis were computed
by averaging over all item scores for a given lesson at a given testing
point.
TABLE 4

Reliabilities and Intercorrelations for
Seven Testing Points: Lessons

Testing                      Testing Points
Points       1      2      3      4      5      6      7      S.D.

   1       (.71)  -.04    .19    .21    .39    .47    .36     .15
   2              (.61)   .61    .09    .25    .15    .29     .06
   3                     (.34)   .50    .72    .68    .68     .04
   4                            (.63)   .83    .78    .49     .10
   5                                   (.82)   .91    .72     .14
   6                                          (.64)   .71     .13
   7                                                 (.72)    .14



These coefficients were not corrected for attenuation. The reliability
coefficients, as computed here, are affected by "errors" in the sampling of
both students and items, so the practical interpretation of such a correction
would not be as obvious as it was in the previous analysis. It might be
noted, also, that the internal consistency model used in computing these
reliabilities has provided estimates that, in several cases, are far below
the reliabilities which actually limit the intercorrelations.

The general pattern of correlations is similar to that found for the
individual items: the post-tests fall into a fairly simplicial pattern, and
the pre-test tends to be more highly related to the late post-tests than to
the early post-tests. The pattern here does contain more irregularities,
however.

The lesson means do not provide a very powerful method for locating weak
items. In fact, the lesson means account for only 27%, 16%, 9%, 17%, 31%,
18%, and 24% of the variance of individual items for testing points 1 through
7, respectively. Any system of review that allocated effort purely on the
basis of lesson or topic difficulty would allocate a good deal of time to
items that did not need review and completely miss a number of items that did
need review.
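
The percentage of item variance accounted for by lesson means can be illustrated with a short sketch. The report does not state how the percentages were computed; the sketch assumes the usual ratio of between-lesson to total sum of squares (the squared correlation ratio). The function name and the item scores are hypothetical.

```python
from statistics import mean


def variance_explained_by_lessons(items_by_lesson):
    """Share of item-score variance that lies between lesson means.

    `items_by_lesson` maps a lesson label to the list of mean scores
    (proportion correct) of the items belonging to that lesson.
    """
    all_scores = [s for scores in items_by_lesson.values() for s in scores]
    grand_mean = mean(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in items_by_lesson.values()
    )
    return ss_between / ss_total


# Hypothetical item means for three lessons at one testing point.
example = {
    "lesson 1": [0.90, 0.35, 0.70, 0.55],
    "lesson 2": [0.40, 0.85, 0.60],
    "lesson 3": [0.75, 0.30, 0.95, 0.50],
}
print(f"{variance_explained_by_lessons(example):.0%}")
```

When this share is small, as in the hypothetical data, easy and difficult items are scattered across every lesson, so a review plan based on lesson means alone would misdirect much of its effort.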



D. Discussion 

The results of this study, in which the retention of electronic
fundamentals was measured by means of criterion-referenced tests, do not differ
appreciably from the results of previous studies in which retention was
measured by means of norm-referenced tests. Retention in the interval
between two weeks and two months, for example, was about nine percentage points
higher in this study than it was for similar material in a study by Wickens,
Stone, and Highland (1952). The loss in retention over a period of three
years following the completion of the course was almost identical to that
found by Williams and Whitmore (1959) for a test on "basic electronics."

This correspondence is largely accidental, however, since it would be quite
possible to design good norm-referenced tests which would provide measures of
retention that range from one end of the scale to the other.

It was found that about half the gain in scores had been lost by the end
of the course, and that this decline continued, at a diminishing rate, over
several years on the job. This does not mean that the students "forgot" half
of what they learned, however. Any index of retention is highly dependent
upon the particular measurement technique that is used. In the current study,
retention was measured by means of unaided recall, since it was felt that
this would provide the best estimate of the information and skills that would
actually be available for use on the job. Had some other technique been used
(for example, recognition or relearning), the amount of measured loss would
probably have been smaller.

An effort was made to use instructor ratings as a technique for
discovering the topics which, if forgotten, would most adversely affect
subsequent learning or performance on the job. It was found that there was very
little agreement on these ratings. Since the agreement that was found was
probably inflated to some extent by invalid stereotypes, caution would seem
to be in order when using information of this kind as a basis for the design
of training sequences.

It was found that the topics rated as being most relevant to the job
were remembered somewhat better than the remaining topics, but that 35
percent of this "important" material had been forgotten by the technician on
the job. This forgetting might be explained in part by ambiguities or
unimportant particulars which, in this particular set of items, were
inextricably confounded with a more basic set of principles that are remembered.
On their face, however, the data suggest either an extreme heterogeneity of
jobs, a lack of validity in the judgement of relevance, or a less than
optimal level of performance by technicians on the job.

The programmed booklets used in this study had all been validated 
against immediate post-tests that covered two to three hours of instruction. 
These tests play a vital role in detecting deficiencies in the programs, but 
they do not provide an adequate means for controlling the proficiency of 
course graduates. The data from testing point 6 indicate that almost a 
third of the items are being missed by a majority of the graduates. 



The obvious solution to this problem is some kind of review procedure
(beyond the reviews that are currently being used in the course), but the
problems involved in such an approach are more imposing than one might
think. An extrapolation from the materials used in this study suggests
that the course as a whole would comprise about 1500 separate training
objectives. The objectives covered in the present sample were generally
tested against a single "item," but a number of these "items" were composed
of several relatively independent problems. Thus, a testing of the course
as a whole would probably require in excess of 3000 hand-scored problems.

It is unlikely that the difficulties imposed by such a mammoth testing
program could be ameliorated to any great extent by sampling from general
topic areas, since the current study indicates that topic areas can account
for no more than a small percentage of the variance in individual items.

The most promising approach to simplification is through a culling of
objectives which are of minor importance or which have already served their
intended purpose, in spite of the difficulties encountered in trying to do
this for the objectives covered in this study.
APPENDIX A



LIST OF PROGRAMS 



 1. Elements of Electrical Physics - Matter
 2. Elements of Electrical Physics - Dynamics
 3. Elements of Electrical Physics - Conductors, Resistors, Insulators
 4. Electrical Calculations - Conversion of Electrical Units
 5. Electrical Calculations - Work, Power, and Energy (Electrical)
 6. D. C. Circuits - Parallel Circuits
 7. Magnetic Theory - Magnetism
 8. D. C. Meters - Meter Movements and Scales
 9. D. C. Meters - Voltmeters
10. D. C. Meters - Multimeters
11. Electromagnetic Devices - Generators
12. A. C. Theory - Generation of a Sine Wave
13. Reactive Circuits - Inductance
14. Reactive Circuits - Transformers
15. Reactive Circuits - Capacitance
16. A. C. Circuit Characteristics
17. Parallel A. C. Circuits
18. Introduction to Vacuum Tubes
19. Voltage Regulation and VR Tubes
APPENDIX B



INSTRUCTIONS FOR RATING ITEMS 



As part of a study on the retention of technical material, the Review 
Tests that are used with 19 of the programmed instruction booklets from Phase 
I were given at various points during the course. These tests contain a 
total of 218 items. 

The forgetting of some parts of this material would be far less serious
than the forgetting of other parts. Some parts, for example, are taught
strictly to develop a given concept; once this concept has been mastered the
original material can be forgotten without serious loss to the technician.
Other parts, however, will be used on the job or in learning the materials
that will be taught in subsequent courses; the technician should remember
this material. I would like your help in identifying the materials that
will be most needed at some later date.

Assume that you could provide review on an individual basis at the
completion of the AFU(A) Course. In other words, you can test each student and,
if he misses a given item, can provide him with a review of that material
without imposing the same review on all other members of the class. Place a
check (✓) in front of each item for which, if the student missed it, you
would provide review. Do not assume that everyone will know the easy items.

Below the check mark, print a "J" if you feel that this information is
really needed on the job. If you feel that a student who did not possess
this information would be seriously hindered on the job, print a "+" after
the "J". In making these judgements, remember that you should be concerned
with the actual requirements of the job, not the intelligence of the student.
If a technician does not know the capital of the United States you would
probably not want him working on your plane, but he does not need this
information to do a good job.

If you feel that a given item reflects information that the student will
really need if he is going to learn the material that will be taught at some
time after the completion of the AFU(A) Course, print a "T" under either the
check or the "J".

You may do this rating at your leisure, but please do not discuss the 
job with other raters. The following table summarizes the codes: 

If a given item is missed:

   ✓    It should be reviewed at the end of AFU(A).
   J    The technician will be hindered on the job.
   J+   The technician will be seriously hindered on the job.
   T    The technician will be hindered in the courses that follow AFU(A).




APPENDIX C 



INSTRUCTORS' RATINGS OF ITEMS



In order to reduce the amount of rating required of each judge, each of
the six item sets was submitted to a different group of nine judges. Each
group consisted of three instructors from each of the three phases of the AFU
course. The reliabilities of the ratings were estimated by calculating
separate reliabilities for each group and then computing an average, by means
of Fisher's z, across the six groups. For the ratings of relevance to the job,
a J was scored as 1 and a J+ as 2. These reliabilities, together with
average correlations between judges and average percentages of endorsement,
can be found in Table 5.



TABLE 5

Ratings of Relevance to Job or to Subsequent Training

                        Reliability      Avg.         Avg. %
Rating                   9 Judges     Correlation    Endorsed

Needs Review               .50           .12            48
Needed on Job              .81           .31            43
Needed for Training        .53           .12            59
Corrected Review           .65           .18            55
Summary                    .77           .28
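
The averaging of coefficients through Fisher's z, described before Table 5, can be sketched as follows. The six group reliabilities in the example are hypothetical. The second function applies the Spearman-Brown formula to an average correlation between judges; the report does not say that the nine-judge reliabilities were obtained in this way, so that step is an assumption offered only as a rough check on the table.

```python
import math


def average_via_fisher_z(coefficients):
    """Average correlation-type coefficients on Fisher's z scale."""
    z_values = [math.atanh(r) for r in coefficients]
    return math.tanh(sum(z_values) / len(z_values))


def spearman_brown(avg_correlation, n_judges):
    """Reliability of the pooled ratings of n judges (assumed model)."""
    return n_judges * avg_correlation / (1 + (n_judges - 1) * avg_correlation)


# Hypothetical reliabilities for the six groups of judges.
group_reliabilities = [0.72, 0.85, 0.79, 0.88, 0.76, 0.83]
print(round(average_via_fisher_z(group_reliabilities), 2))

# Average correlations between judges as listed in Table 5.
for label, avg_r in [("Needed on Job", 0.31), ("Corrected Review", 0.18),
                     ("Summary", 0.28)]:
    print(label, round(spearman_brown(avg_r, 9), 2))
```

Under the Spearman-Brown assumption, average correlations of .31, .18, and .28 give nine-judge reliabilities of about .80, .66, and .78, which are close to, but not identical with, the tabled values of .81, .65, and .77.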





It was assumed that the Review response would serve as a totally redun- 
dant summary for the Job and Training responses, but, as can be seen in Table 
5, this was not the case. More items were endorsed as being required for 
subsequent training than were checked as needing review, and, even though it 
cannot be seen from the summary data, some items were endorsed as J+ without 
being checked for Review. In order to obtain a more consistent index, a Cor- 
rected Review response was created by assuming a positive response to each 
item that was checked for Review, or was endorsed as J+, or was endorsed as 
both J and T. A final Summary index was computed by assigning a value of 1 
to Review checks, J endorsements, and T endorsements, a value of 2 to J+ 
endorsements, and then summing across categories. 
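
The two derived indices can be expressed compactly in code. The scoring rules follow the description above; the numeric coding of the job rating (0 for no endorsement, 1 for J, 2 for J+) and the function names are assumptions made for the illustration.

```python
NO_J, J, J_PLUS = 0, 1, 2  # assumed coding of the job rating


def corrected_review(review, job, training):
    """Positive if checked for Review, or endorsed J+, or endorsed both J and T."""
    return bool(review) or job == J_PLUS or (job == J and bool(training))


def summary_index(review, job, training):
    """1 point each for a Review check, J, and T; 2 points for J+; summed."""
    return int(bool(review)) + job + int(bool(training))


# One judge's hypothetical ratings of three items: (review, job, training).
ratings = [
    (False, J_PLUS, False),  # J+ without a Review check
    (False, J, True),        # both J and T
    (False, NO_J, True),     # T only
]
for review, job, training in ratings:
    print(corrected_review(review, job, training),
          summary_index(review, job, training))
```

An item endorsed only as T stays negative on the Corrected Review response but still adds one point to the Summary index.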

There were just about as many J+ endorsements as J endorsements, but
several judges confined themselves exclusively to one or the other. As a
result, the use of three response categories on this variable did not have as
great an effect on reliability as one might think. If all positive responses,
whether J or J+, are scored the same, the reliability is still .78.




APPENDIX D 



THE ROLE OF COMPUTATIONS IN ELECTRONIC MAINTENANCE 



The early courses in electronics were designed to provide the technician 
with the skills that he would need in troubleshooting a piece of equipment 
with little information beyond that provided on a sparsely annotated schematic 
diagram. Among these skills was the ability to compute the readings that 
should be obtained from various test points, and to determine the way in 
which a signal would be influenced by various kinds of malfunctions. Over 
the years, however, more and more of this information that was once available 
only through computations has been provided directly in the various documents 
available to the technician. Although time devoted to instruction and drill
in computations has been reduced, the general outline of the electronics
courses does not appear to have been affected to any great extent by these
changes.

The computational skills being taught in the current courses are generally
viewed as "enabling" skills. In other words, they are taught so as to
facilitate the learning of other, more job-oriented skills. This view is
reflected in the fact that, of the several dozen computational items tested,
only two were included in the set of 44 items that the instructors considered
most relevant to the job. Nevertheless, almost all of the computational
items were endorsed by some instructors. The 13 items in the "basic" set
were endorsed, on the average, by five out of nine judges. An additional 20
items, related specifically to a-c circuit theory, were endorsed, on the
average, by two out of nine judges.

Because of this belief by some judges that the computations would be 
needed on the job, an effort was made to estimate the level of performance 
that could be expected on tasks that required such computations. One of the 
first problems was to identify these tasks, and this proved to be rather dif- 
ficult. The most frequently mentioned tasks included such things as the 
maintenance of transient aircraft for which the usual maintenance information 
was not locally available, or the modification of circuits when, because of 
some emergency condition, the required replacement parts were not available; 
but no one was particularly happy with these examples. In any case, it ap- 
peared that most of the tasks would require the successful performance of, 
not one, but several computations. 

In order to obtain data that could be used to estimate probability of 
success for tasks requiring various numbers of computations, the 13 basic 
computational problems were abstracted from the regular tests and administered 
to a new sample of 21, high-experience, AVIB students. The average score on 
these items was 61%, a somewhat higher score than that made by similar stu- 
dents in the original sample. For purposes of the present analysis, however, 
the tests were rescored, counting each problem as a separate item and giving 
no credit for partial solutions. This resulted in a total of 23 items, al- 
most half of which required no more than the proper manipulation of metric 
prefixes. Each student’s score was expressed as a percentage, and these per- 
centages were raised to successive powers in order to estimate probability 




of success for various numbers of computations. These estimates were then
averaged over the 21 students. It was found that the probabilities of
success for tasks requiring from 1 through 6 computations were 63%, 44%, 33%,
26%, 21%, and 17%, respectively.
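
The estimation procedure described above can be sketched as follows. Each student's proportion correct is raised to the power k, which treats the k computations in a task as independent with the same chance of success, and the results are averaged over students. The function name and the student scores are hypothetical.

```python
from statistics import mean


def success_probabilities(student_scores, max_computations=6):
    """Average over students of p**k for k = 1 .. max_computations.

    Each entry of `student_scores` is one student's proportion correct.
    """
    return [
        mean([p ** k for p in student_scores])
        for k in range(1, max_computations + 1)
    ]


# Hypothetical proportions correct for five students.
scores = [0.95, 0.80, 0.65, 0.45, 0.30]
for k, prob in enumerate(success_probabilities(scores), start=1):
    print(f"{k} computation(s): {prob:.0%}")
```

Because the powers are taken before averaging, the estimate for k computations is at least as large as the k-th power of the class mean. This agrees with the figures above, where the estimate for two computations (44%) is higher than the square of 63% (about 40%).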

Even though these figures indicate a fairly low probability of success,
they represent a very inflated estimate of the proficiency one might actually
expect to find on the job. These estimates are based on relatively easy
computations, whereas most of the tasks suggested in the interviews would
require the more difficult computations associated with a-c circuits and
amplifying devices. It was found that the AVIB students had a 26% probability of
success for trigonometric computations of the kind required for a-c circuit
work, and that this probability dropped to 12% when these skills were applied
to representative a-c circuit problems. Had problems of this kind been used
as a basis for the estimates, one would have concluded that an experienced
"A" school graduate has essentially no chance of working his way through a
task that requires as many as three or four of these computations.

As was noted earlier, most judges viewed the computational skills as 
an aid to further training. In fact, 21 of the computational items were en- 
dorsed by at least seven out of nine judges as being "really needed" in the 
courses that follow the AFU course. The average score on these 21 items at 
the 6th testing point (the end of the AFU course) was 61%. If the judges are 
correct in stating that a student will be hindered in subsequent courses if 
he does not possess these skills, then a number of students are being hindered. 
It might be profitable to provide the poorer students with additional practice 
in computation, or, alternatively, to investigate training procedures that 
are not so vitally dependent upon those computational skills. 